[Abstract] Constraints on the mass and volume that can be allocated for electronics
spares and repair equipment on long-duration space missions mean that NASA must look at 
repair strategies beyond the traditional approach, which has been to replace faulty 
subsystems in a modular form, termed Orbital Replacement Units or Line Replacement 
Units. Other possible strategies include component and board-level replacement, modular 
designs that allow reprogramming of less-critical systems to take the place of more critical 
failed systems, and a blended approach which uses elements of each of these approaches, 
along with a limited number of Line Replacement Units. This paper presents some of the 
constraints and considerations that affect the decision on how to approach electronics repair 
for long duration space missions, and discusses the benefits and limitations of each of the 
previously mentioned strategies. 


I. Introduction 

NASA’s plans for long duration missions to the Moon and, especially, to Mars require a much greater degree of
self-sufficiency on the part of the crew than ever before. Such missions will have greatly reduced logistic
support from Earth, compared to International Space Station (ISS) operations. Returning to Earth in the event of an 
emergency may not be an option either, as a lunar return flight could require two to three days, and a Martian return 
flight will require much longer. 

One area of mission support that NASA must plan for is electronic repairs. Despite the rigorous testing required 
by NASA, electronics faults have already occurred in both Space Shuttle and ISS operations 1,2, leading to the use of
backup systems or loss of capability to some degree. While the electronics and other systems used in a long 
duration space mission will undergo rigorous testing, the crew of such missions will most likely encounter an 
electronics failure at some point in the mission. With the design of the Crew Exploration Vehicle (CEV) already 
beginning and likely influencing the design of future spacecraft and hardware, it is not too early for NASA to begin 
exploring and designing techniques and tools for crew members conducting electronics repair during long duration
space missions. These considerations include system design (for accessibility, part types and sizes, and board
complexity), repair infrastructure (including diagnostic capabilities, tools, and other needed equipment), and 
logistics constraints. The decisions on how to approach each of these considerations depend on the overall repair
strategy that is chosen. 

The current repair scheme used for the ISS focuses on orbital replacement units (ORU). An ORU is a modular 
electronics assembly or system, designed to be replaced as a unit. When an electronic fault is detected, it is isolated 
to a specific ORU through system performance information. The crew then removes the entire ORU, replacing it with
a spare. Most replacement ORUs are flown from the ground as needed, with limited spares on hand. Preliminary
designs for the CEV use similar modular units for electronic systems, referred to as line replacement units (LRUs).

Previous work regarding electronics repairs during missions has focused largely on the economics of performing 
repairs, in terms of crew time, upmass, and volume of spares. Accola et al. 4 examined the implications of crew
members performing varying amounts of repair onboard the then-proposed Space Station Freedom. Their model 
predicted that increasing the amount of crew-performed repair could save on mass and volume of spares compared to reduced
levels of repair, down to eliminating repairs altogether and simply replacing faulty LRUs with good ones. These
repairs cost a small amount of crew time, with this cost more than offset by the mass and volume savings.

NASA researchers have also begun studies of soldering electronics components in reduced gravity. These 
studies include a demonstration of soldering wires in orbit 5 as well as many studies of soldering aboard a reduced 
gravity aircraft laboratory 6-10. These latter studies show an increase in solder joint void fraction in reduced
gravity compared to solder joints formed in normal gravity. An increase in joint void fraction indicates a greater 
risk for the joint to fail over time; these results show a need for NASA to explore methods of mitigating the increase 
in void fraction. 

This paper discusses the factors that must be considered when choosing a repair strategy for long-duration
space missions and examines several potential repair strategies. The relative advantages and disadvantages of
each approach are discussed, along with suggestions for a ‘blended’ approach, in which a combination of elements
of each may be the optimal solution.


II. Factors Affecting Electronics Repair During Space Missions 

A number of factors must be considered when selecting a strategy for electronics repair during a mission, 
particularly since each of these factors needs to be addressed during the design of the spacecraft and the mission 
profile. For the purpose of this paper, these factors are defined as Electronic System Design (which includes design 
considerations at the board level and subsystem level), Diagnostics, Repair Infrastructure (which includes the tools
and equipment needed to perform the repairs, as well as the logistics of the spares stores) and Crew Considerations. 
While the details on each of these topics are beyond the scope of this paper, each will be discussed in a high-level 
fashion in the following subsections. 


A. Electronic System Design 

The design of the electronics for a long-duration mission is one of the first factors to be considered, since these 
designs can enable or prevent elements of the different repair strategies, and decisions regarding the electronic 
systems design are likely to be made early in the overall spacecraft design process. With regard to repair 
considerations, electronic systems must first be designed for accessibility, meaning the crew member must have 
access to the system in order to repair it. While this may seem self-evident, volume and other design constraints 
may make a significant portion of the subsystems inaccessible to crew members from inside the craft. If this is the 
case, then either robotic or EVA (Extra Vehicular Activity) provisions must be made to allow access. 

If the selected repair strategy involves component-level (or board-level) repair, then the concept of accessibility
needs to be extended down several levels further. Individual boards (within the LRU or subsystem) must be accessed
and disconnected, and, if components are to be replaced, then the conformal coating on the board must be removable.
While most coatings can be removed with various methods in ground-based facilities, limitations on chemicals and
equipment that can be flown make conformal coating removal an accessibility issue that must be considered.

The strategy of component-level repair, as well as the degree to
which that strategy is implemented, will also require consideration in
the design of circuit boards and the choice of components used. If 
manually-operated repair tools (such as a soldering iron) are used, 
limits will exist on what components can be repaired. One example of 
such a component is the ‘ball grid array’ configuration of an 
integrated circuit (Figure 1), in which the leads are not accessible to a 
soldering iron. Other factors, such as the pitch of the leads, may also 
make manual repairs intractable in some cases. While some of these 
limitations may be addressed using other repair technologies (such as 
a semi-automated repair station), such limitations may be minimized 
or avoided through careful design. 

Figure 1: Example of ball-grid array (BGA) electronic component, showing the array of contacts on the bottom
side of the component.


B. Diagnostics

Regardless of the repair philosophy chosen, the problem remains that diagnostics must be included to determine 
where the problem lies in a faulty system. At a high level, the first layer of diagnostics will include isolation of the 
problem to a specific subsystem, or LRU. This may be through symptomatic diagnosis, or through system-wide 
levels of built-in tests (BIT). 

If LRU replacement is chosen, this level of diagnostics is sufficient. If component-level repair is to be 
implemented, then further diagnostics are needed to pinpoint which component is faulty. A trade study completed 
by the authors 11 determined that diagnostics needed for component-level repair are likely to include non-functional
test equipment, which tests components in a non-powered state and compares their signatures against known good 
data. These diagnostics will also be needed for post-repair verification, prior to returning a subsystem to service. 
Additionally, crew members will require standard diagnostic tools such as oscilloscopes, multimeters, etc., to 
properly locate a faulty component. Diagnostic capabilities also require some level of crew training, and may also
include ground support via telemetry.
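
The signature comparison described above can be illustrated with a minimal sketch. This is not the method of the cited trade study; the data layout, the relative tolerance, and the names `Signature`, `is_suspect`, and `find_suspects` are all assumptions made for illustration. Each component's unpowered readings are compared point by point against stored known-good data, and any component that deviates beyond the tolerance is flagged for closer inspection.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Signature:
    """Unpowered measurements for one component (hypothetical format)."""
    ref_des: str                  # reference designator on the board, e.g. "U7"
    readings: Tuple[float, ...]   # measured values, arbitrary units


def is_suspect(measured: Signature, known_good: Signature,
               rel_tol: float = 0.10) -> bool:
    """Return True if any reading deviates from known-good by more than rel_tol."""
    if len(measured.readings) != len(known_good.readings):
        return True  # data that cannot be compared is treated as a failed check
    for got, want in zip(measured.readings, known_good.readings):
        limit = rel_tol * max(abs(want), 1e-9)  # avoid a zero-width window
        if abs(got - want) > limit:
            return True
    return False


def find_suspects(measured: Sequence[Signature],
                  known_good: Sequence[Signature],
                  rel_tol: float = 0.10) -> List[str]:
    """List reference designators that fail the comparison, in measured order."""
    good: Dict[str, Signature] = {s.ref_des: s for s in known_good}
    return [m.ref_des for m in measured
            if m.ref_des not in good or is_suspect(m, good[m.ref_des], rel_tol)]
```

The same comparison could be rerun after a repair as a post-repair verification step, since a repaired board should again match the known-good data within the assumed tolerance.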

C. Repair Infrastructure Requirements 

Within the scope of this paper, repair infrastructure is defined as the set of tools and spare parts required to 
implement the repair. The tools requirements will vary, depending on the strategy chosen. Most strategies are likely 
to entail at least a limited set of small hand tools, for disconnecting fasteners, etc. Component-level repair will also
require some specialized tools, such as a soldering iron or other soldering technology, and specialized tools for
component removal. If semi-automated component-level repair is implemented, then an X-Y positioning stage
(Figure 2) is also needed.

The logistical aspect of spares represents a key element of the infrastructure requirement, with wide variations
between the requirements for LRU replacement and component-level repair. Careful choices must be made to ensure
that adequate quantities of spares are available in the event of multiple failures. The mass and volume required to
provide that number of spares are likely to be a central issue when deciding on a repair strategy.
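
One common way to reason about "adequate quantities" of spares is sketched below. The Poisson failure model, the constant failure rate, and the function names are assumptions introduced here for illustration; the paper itself does not prescribe a sparing model. The sketch finds the smallest number of spares for which the probability of running out during the mission stays below a chosen limit.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(N <= k) for a Poisson-distributed failure count N with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i)
               for i in range(k + 1))


def spares_needed(failure_rate_per_hr: float, mission_hr: float,
                  units_installed: int, confidence: float) -> int:
    """Smallest spares count s such that P(failures <= s) >= confidence.

    Assumes independent failures at a constant rate (a simplification).
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    mean = failure_rate_per_hr * mission_hr * units_installed
    s = 0
    while poisson_cdf(s, mean) < confidence:
        s += 1
    return s
```

Multiplying the result by the mass of one spare gives a rough stowage estimate. Under this model the count is driven by the expected number of failures rather than by the size of the spare, so carrying the same count of components or circuit cards instead of whole LRUs reduces stowed mass directly.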



Figure 2: Example of a semi-automated, electronics diagnostic 
system with X-Y translation stage. 


D. Crew Considerations 

An additional factor in the selection of a repair strategy is the skill and training requirements on the crew, as well 
as time required to implement the various repair strategies. Each of the different repair strategies will require 
significantly different specific skills and training to implement the repairs, although some of the skill requirements 
may be partially mitigated through the use of a ‘tele-science’ approach, in which specific portions of the diagnosis of 
the problem are handled in conjunction with ground-based engineers. Up-linked training materials on specific repair 
issues can also help minimize the need to pre-train the crew for every specific incident, though a baseline of 
knowledge and experience will clearly be needed. 

Crew time required to perform a repair will also be a relevant consideration. Critical systems may (in some 
circumstances) require repair strategies which can be implemented very quickly, while other repairs may not be 
time-critical. Crew time is always considered a high-value resource, but the value of the crew time must be weighed 
against all of the other considerations when examining the choice of repair strategy. Clearly, less crew time is 
required for LRU replacement, but this may be overridden by the need to reduce the volume of spares for the 
mission. 


III. Possible Strategies for Electronics Repair

The general strategies for addressing electronic failures during a mission can be classified broadly as: 

1) LRU Replacement: This approach involves removing and replacing modular subsystems, which are 
comprised of ‘blocks’ of circuit boards. 

2) Component-Level Repair: This approach entails repair of individual faulty components on a circuit
board.

3) System Re-Programming / “Scavenging”: This approach involves the use of “common modules” (to 
the degree possible) which can be removed from service on less crucial systems, and re-programmed to 
replace critical systems that have undergone failure. 

4) Combined Approach: This approach uses a blended combination of each of the three distinct
strategies, in an attempt to provide an optimal repair/maintenance solution.

Additionally, the ‘component-level repair’ strategy can be divided into manually performed repairs (i.e., hand 
tools only), and semi-automated repair (with mechanized aids for diagnostics, component placement, and soldering). 
Each of these approaches has distinct advantages and disadvantages, in terms of system design requirements, the 
diagnostics and repair infrastructure needed, and crew considerations. The following sections examine some 
possible implementations of these strategies, with discussion of the benefits and limitations of each approach. 

1. LRU Replacement 

LRUs are modular subassemblies that generally contain a series of related circuit boards, designed to
collectively perform a specific function (or multiple functions) within a larger system. These units are packaged for 
relatively easy installation and removal, and require comparatively low crew time and training to use. 

A representative range in LRU spatial dimensions is from about 3” x 3” x 3” to 28” x 17” x 7”, with mass
ranging from approximately 1 lb to 75 lbs. Figure 3 shows an example of a typical LRU 12 . When an electronic
system fails, the fault is isolated to a specific LRU, using system performance observations and some level of
built-in diagnostics. The crew member then removes the entire LRU and replaces it with a spare or relies on a backup
system. The faulty LRU then returns to Earth with the shuttle, where it may be repaired, tested and returned to a
pool of spares awaiting return to space.

The use of LRUs affords several important benefits. Since the unit can be thoroughly tested prior to flight, the
reliability of LRU replacement as a repair strategy is very high. Also, the requirement on crew training and
experience is minimal, and the LRUs can generally be replaced fairly quickly and with minimal tools.

However, the LRU replacement strategy carries a high penalty in terms of mass and volume, considering the likely
requirement of multiple spares per system. Furthermore, long-duration missions, particularly Martian missions, may
require travel times that prohibit re-supply from Earth. The crew must bring along all potentially needed spares.
Given the complexity of a long duration spacecraft, the number of spares required for adequate mission safety for
each LRU is probably prohibitive.



Figure 3: Example of an LRU (Electrical Power and 
Control Unit, referred to as EPCU) 12 . 


2. Component-Level Repair 

While LRU replacement approaches electronic repair from the standpoint of subsystem replacement, component-level
repair addresses the problem at the lowest possible level. Although this approach offers the greatest savings in terms of
mass and volume of spares, it does require at least some additional tools and infrastructure (such as additional
diagnostics, beyond the BIT diagnostics already used in LRUs). Further, crew training becomes much more
important, and time requirements to perform the repairs increase.

NASA has already begun investigation of component-level electronics repair, with the CLEAR project (standing 
for Component-Level Electronic-Assembly Repair) 13 . This effort, which is a combination of trade studies, normal 
gravity demonstrations, and reduced gravity testing aboard aircraft facilities and the ISS, is examining the feasibility 
of manually-operated, component-level repairs (i.e., with a soldering iron) for long-duration missions. Early efforts 
have shown the general feasibility of electronics repair in this environment, building on previous research done on 
the effects of the reduced gravity environment on the soldering process 5,6-10.

An extension of that work would be to perform component-level repair using a semi-automated re-work station, 
as previously noted. Such stations use computer controlled translation stages and machine vision to position 
diagnostics or soldering/de-soldering devices on faulty components whose size or configuration would not typically 
permit manual repair. This capability would allow repair of a broader class of electronic components, as well as 
advanced soldering technologies such as infrared re-flow, or convective heating re-flow. Systems of this type would 
allow inclusion of more components in the “repairable” category than manual repair (thereby further reducing the
need for spare LRUs), and would help to minimize the need for crew experience in manual repairs.

3. System Re-Programming / Scavenging 

The notion of re-programming or scavenging systems to repair or work around failures is a strategy that could
have several embodiments. One version of this involves the use of existing, reprogrammable resources to reprogram 
faulty subsystems and work around damaged components. Although there are several methods to implement this, 
field programmable gate array (FPGA) devices can be used as an example. With an FPGA, the actual 
configuration is typically loaded into the FPGA after power-up. The configuration can be stored in non-volatile 
memory and could be changed. For example, if the failed subsystem were a power supply or power conditioning 
module, re-programming could allow supply voltages to be programmed to different settings. The same approach
can be taken at the LRU level, where sub-systems could be comprised of items that could be transferred to different 
applications. This could include computer cards, motion amplifiers/controllers, etc., with the different functionality 
being determined by the FPGA code that is loaded. 

Another strategy for recovering from electronic failures would be to scavenge existing parts or subsystems to 
repair a faulty circuit board, or work around a fault. With scavenging techniques, crew members would remove 
components from a good board in another system deemed less crucial to operations and crew well-being than the 
system with the damaged circuit. Crew skill and time requirements for this approach are not very different from the 
component-level repair plan already discussed. This strategy, though, will require considerable work prior to 
launch. Mission designers and planners will have to compile a list of components available that is easily accessible 
to the crew or ground support teams, use common components across all circuit boards to the greatest degree 
possible, and prioritize systems and circuit boards to aid in deciding which circuit board to scavenge. 
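
The pre-launch planning described above (a component inventory plus a priority ranking of systems) can be sketched as a simple lookup. The data model, the criticality scale, and the names `Board` and `pick_donor` are hypothetical and serve only to illustrate the idea: given a failed board and the part it needs, choose the least critical healthy board that carries that part.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Sequence


@dataclass
class Board:
    """One circuit board in the inventory (hypothetical record layout)."""
    name: str
    criticality: int                  # 1 = most critical; larger = less critical
    parts: Dict[str, int] = field(default_factory=dict)  # part number -> count
    healthy: bool = True


def pick_donor(boards: Sequence[Board], part_number: str,
               failed_board: Board) -> Optional[Board]:
    """Choose the least critical healthy board that carries the needed part.

    Only boards less critical than the failed one are considered, so a
    repair never degrades a system that matters more than the one restored.
    Returns None when no such donor exists.
    """
    candidates = [
        b for b in boards
        if b.healthy
        and b.name != failed_board.name
        and b.criticality > failed_board.criticality
        and b.parts.get(part_number, 0) > 0
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.criticality)
```

Using common components across boards, as recommended above, enlarges the candidate list for any given part and so makes a suitable donor more likely to exist.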

4. Combined Approach 

The discussions of the strategies listed in this paper may seem to suggest that these strategies are exclusive of 
each other, when the most practical approach is likely to be a combination of each of these. For example, a 
comparatively small number of spare LRU’s may be required for critical systems, so that the functional system 
could be brought back on-line while the faulty component is found and repaired in the original unit. Less critical 
systems, in certain areas, may be addressed using a scavenging and re-programming approach, with the faulty LRU
being repaired, then re-installed in place of the scavenged LRU. An additional implementation could involve board-level
replacement within the LRU for components that are not easily repairable given the infrastructure or tools that
are available. This approach is a compromise between true component-level repair and full LRU replacement; while 
it does require more spare storage space than true component-level repair, it requires considerably less space than 
that of full LRU replacements. A combined strategy of component-level repair on some (significant) fraction of 
parts, with non-repairable parts being replaced at the circuit-card level, along with a minimal number of LRU spares, 
and LRUs that are designed (where possible) for re-programming and re-use, would allow the greatest flexibility in
repairing a range of faults while still minimizing the logistics footprint required, to the degree possible. 


FPGA’s are programmable hardware devices, in which the internal hardware configuration is controlled via 
software, allowing implementation of custom logic circuits. 


IV. Discussion

The previously described strategies represent three distinct approaches to electronics repair on long-duration space 
missions. Each carries certain advantages over the others, along with corresponding disadvantages. For the purpose 
of this discussion, comparisons will be made in two broad areas: ease of repair / extent of crew training required,
and volume and mass savings. A major assumption in this discussion is that all of the three methods are in fact 
practical, and can be implemented effectively from an operational standpoint. In practice, only the LRU 
replacement method has been previously implemented, and therefore fully proven. While early results show 
significant promise, component-level repair is a strategy still being investigated by NASA 13 . 

Ease of Repair and Extent of Crew Training 

The first area of comparison is ease of repair and minimizing training, repair, and practice time for the crew. 
Crew training and repair time is an important consideration because of the extensive amount of training for all 
aspects of a mission prior to launch and the planned scientific, housekeeping, and maintenance exclusive of repair 
during a mission. For these reasons, plans that reduce the training and repair workload for the crew are typically 
given strong consideration by NASA planners. 

Replacing LRUs has advantages over the other repair plans in terms of required crew training time, practice 
time, and time spent performing a repair. Replacing an LRU requires little training beyond familiarity with electrical
and mechanical connectors for the LRUs, which is required for any repair scenario. Replacement strategies can also 
use a standardized set of procedures, which also simplifies the repair process. Replacing LRUs also limits the 
amount of ground support required for the repair. Ground support teams will most likely receive BIT results 
regardless of the repair plan used, and need only advise crew members as to which LRU to replace, where to find the 
replacement and stow the faulty unit, and any specific procedures for replacing the LRU in question. 

The other repair plans require more crew training and time than simply replacing the LRU. Reprogramming 
FPGAs or similar devices requires training to open an LRU, find the faulty component, and load a new program.
This also requires extensive diagnostic testing, and involvement with a ground support team to properly diagnose the 
fault, and to possibly send new code for the failed device from Earth. Scavenging and replacing a failed component 
may face many of the same ground support requirements as reprogramming components - aid in diagnosis and 
planning and implementing a repair. These processes also require extensive crew training prior to the flight. This 
training includes soldering techniques with the available tools, performing diagnostic testing and some interpretation 
of the results, and techniques for handling a circuit board and components beyond soldering component leads. Crew 
members may also require practice in soldering techniques to retain competency in performing repairs. 

Volume and Mass Savings 

Saving on the volume and mass of electronic repair tools and parts, as with any other system carried into space, 
is of critical importance. Mass and volume savings in one area translate to allowances in other areas. Repairing a
board requires some volume and mass, both in tools and spare parts. This plan requires storing spare circuit boards 
and spare components, as well as providing diagnostic and repair tools. Scavenging and reusing or reprogramming 
components saves mass and volume by reducing the number of spare circuit boards and components, but does not
eliminate the need for diagnostic and repair tools. Replacing LRUs, though, is likely to require much more volume 
and mass, compared to the other methods. While replacement does not require the same level of diagnostic and 
repair tools that may be required by the other methods, it does require full-sized LRUs to be stored. Given that an 
entire LRU typically contains multiple boards, as well as a framework and external housing, it is clear that replacing 
an entire LRU is less efficient, in terms of storage of spares, than the other methods. This is particularly significant 
when considering the need for multiple spares, to allow multi-fault tolerance (i.e., multiple or repeated failures or 
damage to the same system). 


Design and Support Considerations 

For the implementation of any level of component or board level repair strategy to be successful, systems would 
need to be designed so that “non-repairable” components were segregated, to the greatest degree possible, on 
smaller ‘daughter’ cards, allowing a larger number of smaller cards to be carried. Further, the circuit boards and 
LRUs must be designed for repair, which includes allowing accessibility to the boards, use of common components 
when possible, board layouts that reduce the difficulty of performing a repair, common LRU and board interfaces
where possible, and extensive BIT to reduce the diagnostic tasks left to a crew member. These techniques will 
reduce the crew workload during a repair, and the amount of training time necessary for crew competence. Ground
support teams who are able to examine diagnostic test results (with engineering models of electronics on hand) can 
greatly aid in the diagnosis and repair of faulty electronics. Finally, one or more crew members should be 
knowledgeable in the field of electronics. This does not mean that a crew member should be an electronics engineer
or technician, but that he or she should be trained in electronics repair through a program such as the U.S. Navy 2M
program 14 , or a similar program focused on the types of repairs and components expected during a space mission.


V. Conclusions 

This paper examines strategies for the repair of faulty electronics in long duration space missions. The 
discussion first focuses on factors affecting the crew’s ability to perform these repairs. These factors include
designing the electronics so they can be repaired during a mission, the diagnostics required to both determine the 
source of a fault and to test a repair before returning the system to use, and crew considerations including training, 
experience, and ground support through telescience. The paper then discusses three general strategies for 
performing repairs during a mission: replacing an LRU, repairing components on circuit boards within a faulty LRU,
and reprogramming or scavenging parts from a less critical LRU for a faulty, but necessary, LRU. Finally, the 
authors recommend a mix of replacing entire LRUs when the system is a critical component, having spare circuit 
boards for circuits requiring complex repair techniques, and spare components for LRUs that are not mission critical 
and do not require difficult repair procedures. Combined, these methods reduce the volume and mass required for
spare parts while increasing both repair capability and the likelihood of overall mission success.

While the development and implementation of such a repair capability will require effort and expenditure, the 
cost of failing to provide a well-planned repair capability could be catastrophic, both economically and in human 
terms, if the failure to make such repairs were to lead to mission failure and loss of life. When considered in those 
terms, the efforts needed to explore, develop, and implement appropriate repair strategies to allow sustainable and 
supportable long-duration missions do not appear nearly as costly.